We present a theory for the reverse analysis of the sequence information of a single H/P two-letter random
heteropolymer (RHP) from its force-extension ($f$-$z$) curves during quasi-static stretching. Upon stretching,
a self-assembled RHP undergoes several structural transitions. The typical elastic response of a heteropolymeric
globule is a set of overlapping saw-tooth patterns. By considering the height and the position of the overlapping
saw-teeth, we analyze the possibility of extracting the binding energies of the internal domains and the
corresponding block sizes of the contributing conformations.



I. INTRODUCTION 



Within the last decade, a series of remarkable force-extension experiments has been performed using Atomic Force Microscopy
(AFM). These experiments show that the elastic response of a single molecule is clearly related to the internal structure of
the molecule. Force-extension profiles of single molecules such as DNA, RNA, synthetic polyelectrolytes, the giant protein titin
and chromatin fibers show characteristic saw-tooth patterns, which are interpreted as the successive unfolding of internal
domains. This is in agreement with theoretical studies and computer simulations predicting that a step-wise unfolding pattern
can be seen in the unfolding of pearl necklaces of polyelectrolytes in poor solvents and in protein models.
In some polymer systems (especially biopolymers and proteins), intra-chain self-assembly produces secondary or tertiary
structures, and the elastic response reflects this structural hierarchy. The AFM experiments show that a series of partial
unfoldings of those collapsed structures occurs when an external force is applied. When the elastic energy gain is comparable
with the increase of the potential energy, the extension increases abruptly by $\delta z$. The resulting force-extension profile
is rich and reflects the size of the domains responding to the applied force. Information on the sequence of the linear structure
is thus revealed in the force-extension curve. In this sense it is interesting to trace back the particular sequence structure
of a given chain from the measured elastic response.

In our previous study, we minimized the free energy at a given force (which mimics a constant-force measurement).
The obtained minimum corresponds to the ground state or to metastable states. At several characteristic values
of the force, segments of the linear chain in the collapsed phase unfold, producing "plateaus" in the $f$-$z$ curve. However,
these "plateaus" often correspond to multiple conformational transitions passing through different extensions $z$ if the
domains have similar binding energies. Therefore, sequence information is partly washed away in constant-force measurements.
Another experimentally common, yet theoretically more challenging, AFM setup imposes the distance and measures the
restoring force. Typically, the force-extension profile has a saw-tooth pattern. Each time an internal domain is pulled
out, the contact with the cantilever becomes loose, resulting in a big drop of the measured force. Hence, the sequence
information is more directly accessible by force-extension measurements when the distance is imposed. The question that
arises is whether it is possible to recover information about the sequence of the polymer from force-extension profiles.
For this purpose, we present a theoretical framework for "reading" the sequence information from the elastic response. We
demonstrate the mapping of the force-extension profiles onto the sequence information under controlled displacement. We
show that it is feasible to extract the composition of block sizes to some extent, while the order in which those blocks are
arranged remains an open question.



II. GENERAL MODEL 



We consider a polymer chain of $N$ monomers, one end of which is fixed at a reference point (i.e. $z = 0$) while the other end is
brought to the distance $z$ from the reference point. The sequence consists of $n_h$ hydrophobic (h) blocks and $n_p$ hydrophilic
(p) blocks in alternating order ($n_h = n_p$). The size of the $i$-th hydrophobic (hydrophilic) segment is $N_i^h$ ($N_i^p$), and
the sequence of the whole chain can be represented by a series of h- and p-blocks of sizes $\{N_i^p, N_i^h\}$.
FIG. 1: (a) Schematic picture of all possible conformations for a heteropolymer consisting of four h- and p-blocks of sequence
4p-4h-8p-6h-20p-6h-6p-6h, classified according to the number of released p-blocks ($n_r^p$). (b) The force-extension curves for
the same sequence. The dashed line is the force-extension curve following the minimal energy conformations according to Eq. (1)
with $\gamma = 9 k_B T/b^2$ ($\tau = 3$), and the solid line is the statistically averaged force-extension curve.



We assume that hydrophobic segments have a tendency to collapse into a compact globule of radius $a_i$ and that these unit
globules are not further stretchable. These single block globules can merge together into a larger globule leaving the connecting 
hydrophilic segments as loops on the surface of the large globule. 

The optimal conformation of the chain is obtained from the minimum of the free energy at the given extension $z$. The free energy
consists of two main contributions: the interaction energy of the collapsed h-blocks (globules) and the elastic part of the
p-blocks (strings). For simplicity, we assume here that the p-block strings have the elastic properties of an ideal Gaussian chain
(later we discuss the more realistic Langevin chain model). For a chain of length $N_i^p$ and size $z_i$, the elastic energy is
$F_{\rm elas} \simeq k_B T\, z_i^2/(N_i^p b^2)$, where $b$ is the monomer size. The elastic part of the free energy comes from the
released hydrophilic segments connecting two nearest globules. Loops (hydrophilic segments with both ends attached to an
aggregated globule) do not contribute to the total elastic energy.

Initially the chain is fixed at the minimal extension $z_0$ ($z_0 \ll Nb$); all h-blocks belong to one large collapsed globule
and only one end p-block is outside of this globule. As the imposed distance $z$ varies, the chain adapts its conformation in
order to minimize the total free energy. Each conformation can be characterized by the number of released p-blocks and the
positions of the released p-blocks, as illustrated in Fig. 1(a). The free energy of the conformation in which the $q$-th p-block
is released is written as:



E 
(1)



where $S_k^m$ and $a_k^m$ denote the surface area and the radius of the globule consisting of the $k$-th, $(k+1)$-th, ..., $m$-th
h-blocks, respectively, and $\gamma = k_B T \tau^2/b^2$ is a surface tension, with $\tau$ being the reduced temperature
$\tau = |T - \theta|/\theta$. Similar equations can be written for conformations with an arbitrary number of h-blocks. If there
are $n_p$ p-blocks, there are $n_p - 1$ conformations in which one of the internal p-blocks is released. The number of
conformations in which $m$ of the $n_p - 1$ internal p-blocks are released is
${}_{n_p-1}C_m = \frac{(n_p-1)!}{m!\,(n_p-1-m)!}$. The total number of conformations is $2^{n_p-1}$. In Fig. 1(a), we show all
$2^{4-1} = 8$ possible conformations of a heteropolymer consisting of 4 p-blocks and 4 h-blocks. The conformations listed along
the vertical lines have the same number of released p-blocks ($n_r^p$) but a different grouping of h-blocks.
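
The counting of conformations can be checked by brute force. The following minimal Python sketch assumes that a conformation is labelled only by the subset of internal p-blocks that are released (the end p-block always being outside the globule); the function name and the labelling of blocks are illustrative choices, not part of the model definition.

```python
from itertools import combinations
from math import comb

def enumerate_conformations(n_p):
    """Map m -> list of tuples of released internal p-block labels.

    Assumption: only the n_p - 1 internal p-blocks (labelled 2..n_p here)
    can be either released or kept as loops on the globule surface.
    """
    internal = range(2, n_p + 1)
    return {m: list(combinations(internal, m)) for m in range(n_p)}

confs = enumerate_conformations(4)
counts = {m: len(c) for m, c in confs.items()}
total = sum(counts.values())
```

For four p-blocks the conformations group by the number of released internal blocks in the same way as the columns of Fig. 1(a).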

In general, the free energies of the conformations differ slightly from each other. For any given extension $z$, there are
several local energy minima with similar free energy $E_r$. These conformations contribute to the thermodynamic properties of
the force-extension relation with the statistical weight $\exp(-E_r/k_B T)$. In order to plot the force-extension curve, all
possible conformations at the given $z$ must be taken into account with this statistical weight. The statistical sum $G(z)$ of
all possible conformations at the displacement $z$ is:



$$ G(z) = \sum_r \exp\left[-\frac{E_r(z)}{k_B T}\right] \qquad (2) $$
FIG. 2: (a) The probability distribution of the sequence 3p-6h-5p-5h-3p-3h-2p-2h-3p-10h-3p-3h in the space of all possible
conformations with $\gamma = 4 k_B T/b^2$ ($\tau = 2$). The x-axis is the given extension $z$; the y-axis is the index of each
conformation, defined in a way similar to that illustrated in Fig. 1. The grey scale bar on the left-hand side shows the
probability scale. (b) The corresponding force-extension curve. The symbols (o and +) represent the fitting results using Eq. (10).



The restoring force acting on the polymer chain is:

$$ f = -k_B T\, \frac{\partial \ln G(z)}{\partial z}. \qquad (3) $$
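
As an illustration of Eqs. (2) and (3), the sketch below evaluates the statistical sum for a list of conformation energies and differentiates $\ln G$ numerically. The two energy functions are hypothetical (Gaussian strings of free length 3 and 8 in units $b = k_B T = 1$, with an assumed binding-energy cost `eps` for the more open conformation); they stand in for the conformation energies $E_r(z)$ and are not taken from the sequences studied here.

```python
import math

def force(energies, z, kT=1.0, dz=1e-5):
    """Restoring force f = -kT d(ln G)/dz for a list of energy functions E_r(z)."""
    def log_G(x):
        # log-sum-exp form of ln G(x), stable when the energies are large
        a = [-E(x) / kT for E in energies]
        top = max(a)
        return top + math.log(sum(math.exp(v - top) for v in a))
    # central finite difference of ln G
    return -kT * (log_G(z + dz) - log_G(z - dz)) / (2.0 * dz)

# Hypothetical two-conformation example (units b = kT = 1).
eps = 4.0
energies = [lambda z: z**2 / 3.0, lambda z: z**2 / 8.0 + eps]
```

The result equals the Boltzmann-weighted average of the branch forces $dE_r/dz$, so it follows the stiffer branch at small $z$ and the softer branch once the open conformation dominates.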

In Fig. 1(b) we show a force-extension curve calculated for a randomly chosen sequence 4p-4h-8p-6h-20p-6h-6p-6h. For
convenience, we choose $S_k^m \simeq b^2 \left(\sum_{i=k}^{m} N_i^h\right)^{2/3}$ and $a_k^m \simeq b \left(\sum_{i=k}^{m} N_i^h\right)^{1/3}$.
The dashed line represents the force obtained from the minimal energy conformation for each extension $z$. If fluctuations are
negligible, the expected force-extension curve is the sharp saw-tooth pattern shown as the dashed line in Fig. 1(b). Each
transition from one conformation to another is captured as a "drop" of the restoring force, which indicates that the minimum
energy conformation switches to a different conformation. The force increases with the extension until the next "drop". The
force between the "drops" is proportional to $z/N_p$ ($N_p$ is the total size of the free p-blocks). The longer the free chain
is, the easier it is to stretch.

The solid line is the force-extension curve obtained from Eq. (2) and Eq. (3), where all local energy minima are also
taken into account with the proper statistical weight. The unfolding of the large globule follows the path illustrated in
Fig. 1(a). The release of each unit globule leads to a jump. The height of each jump becomes smaller as the overall globule size
becomes smaller, so that the surface energy difference before and after the release becomes smaller. We note that one of the
transitions is between conformations with the same number of released p-blocks (the 2nd jump, from conformation 2 to conformation 3).

Another example of a force-extension curve, for a different randomly chosen sequence (6 p-blocks and 6 h-blocks of different
lengths), is shown in Fig. 2. Here the probability of being in each conformation $i$ (given by its statistical weight in Eq. (2))
is shown in the space of all $2^{n_p-1}$ possible conformations (see Fig. 2(a)). The dark region indicates the favorable
conformations under the given constraint (fixed $z$). In some ranges of $z$, there are several conformations with similar
statistical weight. The transition from one group of conformations to another in Fig. 2(a) near $z = 20$ does not result in any
noticeable feature in Fig. 2(b). Only the first few jumps corresponding to conformational transitions are visible.

Why are only two or three transitions visible on the force-extension curve? At a jump, the dominant conformation shifts from
one to another. If there is a clearly favored conformation, the transition is sharp. Otherwise, if several conformations
contribute with similar weights, transitions are not expected to be captured as a clear saw-tooth shape in a quasi-static
measurement, and the fluctuation around the average force is large. Around each transition there is a region of strong
fluctuations, $\delta z_n$, where the difference in energy of the competing conformations is smaller than $k_B T$. For the $n$-th
transition this region is about $\delta z_n \simeq k_B T z_n^*/\varepsilon_n$. Here $z_n^*$ is the $n$-th transition point and
$\varepsilon_n$ is the binding energy related to this transition. The size of the fluctuation region typically grows with $n$
because $z_n^* \propto n$. We should note that the binding energy $\varepsilon_n$ cannot be much larger than $k_B T$; otherwise it
is difficult to perform a quasi-static experiment. This means that after several transitions, $n \simeq \varepsilon_n/k_B T$, the
fluctuation regions should overlap, $\delta z_n \simeq z_n^* - z_{n-1}^*$, and the typical zigzag pattern of each transition
starts overlapping with that of the neighboring transitions. Fig. 2(b) demonstrates such a smooth $f$-$z$ curve after a few
initial jumps. At large extension, when all loops are pulled out, the force increases monotonically with extension.
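
The overlap estimate $n \simeq \varepsilon_n/k_B T$ can be illustrated with a small numerical sketch. It assumes a chain of identical blocks with Gaussian strings, for which balancing the elastic energy before and after a release gives $z_n^* = \sqrt{\varepsilon\, l\, n(n+1)}$ in units $b = k_B T = 1$; this uniform-block setup and the function names are illustrative assumptions.

```python
import math

def transition_points(eps, l, n_max):
    """z*_n for identical blocks (units b = kT = 1).

    Assumes z*^2/L_n = z*^2/L_{n+1} + eps with L_n = n*l,
    which gives z*_n = sqrt(eps * l * n * (n + 1)).
    """
    return [math.sqrt(eps * l * n * (n + 1)) for n in range(1, n_max + 1)]

def first_overlapping_transition(eps, l, n_max=100):
    """Smallest n whose fluctuation width z*_n/eps reaches the spacing z*_n - z*_{n-1}."""
    z = transition_points(eps, l, n_max)
    for i in range(1, len(z)):
        width = z[i] / eps          # delta z_n ~ kT z*_n / eps
        spacing = z[i] - z[i - 1]   # distance to the previous jump
        if width >= spacing:
            return i + 1            # list index i corresponds to n = i + 1
    return None
```

With these assumptions the first overlapping transition sits at $n$ equal to the binding energy in units of $k_B T$, independent of the block length $l$.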

In realistic experimental situations, one chain end is pulled with a small but finite speed. The free energy difference
$\delta E$ between the dark and bright conformations gives a typical relaxation time $\sim e^{\delta E/k_B T}$ for the transition
between two likely conformations. Depending on the pulling speed, some energy barriers between conformations are overcome but
others are not. Conformations separated by a large energy barrier do not contribute to the $f$-$z$ curve when the pulling is
faster than the chain relaxation. Thus, the accessible conformations can be controlled by the pulling rate, and this allows
extracting more detailed information about the structure of the polymer. We shall address this question in a future publication.



III. READING THE SEQUENCE INFORMATION FROM F-Z CURVE 

Simple model. In the following, we show how to extract the chain sequence information from the force-extension curve. In
order to do so, we further simplify the conformational space. As illustrated in Fig. 3, we assume that the globules are arranged
in 1-d and interact only with neighboring globules. We denote by $\varepsilon_m$ the interaction energy between the $m$-th and
$(m+1)$-th h-block globules. A transition between conformations is related only to the release of a unit globule-loop pair from
a larger globule. We will show that the interaction energy $\varepsilon_m$ can be extracted from the analysis of the
force-extension curve (see Fig. 3). A more realistic assumption would be that all aggregated globules $m-2$, $m-1$, $m$ interact
with the $(m+1)$-th globule. In this case the energy $\varepsilon_m$ depends on the arrangement of the globules.

In the 1-d model, each conformation is completely characterized by two sets of variables, $\{\varepsilon_m\}$ and $\{l_m\}$,
where $\varepsilon_m = \gamma\,[(S_m^m + S_{m+1}^{m+1}) - S_m^{m+1}]$, $S_k^m$ is the surface area of the globule consisting of
the $N_k^h, N_{k+1}^h, \ldots, N_m^h$ h-blocks, and $l_m = N_m^p b$ is the length of the $m$-th p-block. In the absence of an
external force, all h-globules are attached and aligned in one line. With the increase of the imposed distance $z$, the contacts
between h-globules break one after another. In the force-extension curve, these events are represented as "drops" in force.
Knowledge of the $z$-coordinate of a jump (denoted $z^*$ below) and of its magnitude $\Delta f$ allows one to determine
$\varepsilon_m$ and $l_m$ uniquely. At the conformational transition releasing the $(m+1)$-th loop, where $z = z^*$, the energies
of the two conformations should be equal: $E_m = E_{m+1}$. This leads to the following relation in $k_B T$ units,

$$ \frac{z^{*2}}{L_m} = \frac{z^{*2}}{L_m + l_{m+1}} + \varepsilon_{m+1} \qquad (4) $$



where $L_m$ is the total linear length of the chain before the transition. In this relation we assume that the p-block segments
are much longer than the size of the collapsed h-blocks. Otherwise, the size of the hydrophobic globules becomes relevant as an
offset in the elastic energy of the chain. Then Eq. (4) reads

$$ \frac{(z^* - 2a_1^m)^2}{L_m} = \frac{\left(z^* - 2(a_1^m + a_{m+1}^{m+1})\right)^2}{L_m + l_{m+1}} + \varepsilon_{m+1} \qquad (5) $$

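One can relate the height of this jump, $\Delta f = f_m - f_{m+1} = dE_m/dz - dE_{m+1}/dz = [2z^*/L_m - 2z^*/(L_m + l_{m+1})]\,(k_B T/b)$,
to $\varepsilon_{m+1}$ by comparison with Eq. (4):

$$ \varepsilon_{m+1} = \frac{\Delta f\, z^*}{2} \qquad (6) $$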

Similarly, the length $l_{m+1}$ can be extracted from the slope difference before and after the jump,
$1/f_{m+1} - 1/f_m = l_{m+1}/2z^*$ (in units of $b/k_B T$):

$$ l_{m+1} = \frac{2\,\Delta f\, z^*}{f_{m+1} f_m}\,\frac{k_B T}{b} \qquad (7) $$
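
The reading of a single sharp jump can be sketched in a few lines. The example assumes Gaussian branches $f = 2z/L$ in units $k_B T = b = 1$, so that $\varepsilon_{m+1} = \Delta f\, z^*/2$ and $l_{m+1} = 2\Delta f\, z^*/(f_m f_{m+1})$; the function name and the synthetic input values are hypothetical.

```python
def read_jump(z_star, f_before, f_after):
    """Return (eps, l) for one saw-tooth jump, in units kT = b = 1.

    eps = df * z_star / 2                          binding energy of the released globule
    l   = 2 * df * z_star / (f_before * f_after)   length of the released p-block
    """
    df = f_before - f_after
    eps = df * z_star / 2.0
    l = 2.0 * df * z_star / (f_before * f_after)
    return eps, l

# Synthetic check with hypothetical values: L_m = 3, l_{m+1} = 5, eps_{m+1} = 4.
L_m, l_true, eps_true = 3.0, 5.0, 4.0
# transition point from the energy balance z*^2/L_m = z*^2/(L_m + l) + eps
z_star = (eps_true / (1.0 / L_m - 1.0 / (L_m + l_true))) ** 0.5
f_before = 2.0 * z_star / L_m
f_after = 2.0 * z_star / (L_m + l_true)
eps_est, l_est = read_jump(z_star, f_before, f_after)
```

Feeding the jump position and the two force values back into `read_jump` recovers the binding energy and block length used to generate them, which is the sense in which a sharp jump determines both quantities uniquely.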

We notice that all inclined parts of the curve, if continued, have zero intercept. The order of release is determined by either
the minimal interaction energy of the h-globules, $\varepsilon_m$, among all remaining $\varepsilon_k$, or the maximal length of
the p-segment $l_k$. If all blocks are of similar size, with small variations $\delta\varepsilon_k$, $\delta l_k$ around the
average values $\bar\varepsilon$ and $\bar l$, then $L_m \sim z$, and from Eq. (4) we find that the next segment $k$ to be released
is determined by the largest relative variation, $\max\{\max_k(\delta\varepsilon_k/\bar\varepsilon), \max_k(\delta l_k/\bar l)\}$.

Reading thermally averaged f-z curves. In the vicinity of the $m$-th transition point $z_m^*$ (abbreviated as $z^*$ below), where
the $(m+1)$-th loop is released, the difference between the energies of the two states can be small and comparable to $k_B T$.
Because of thermal fluctuations, the actual force-extension curve can be very noisy. If these fluctuations are properly averaged,
the edge of the sharp saw-tooth is rounded (see Fig. 3). We will show below that the parameters $\varepsilon_m$ and $l_m$ can be
extracted from the rounded curve too.

In the absence of thermal fluctuations we describe the jump in the force-extension curve in Fig. 3 using the well-known step
function ($\theta$-function): $f(z) = f_m(z)\,\theta(z^* - z) + f_{m+1}(z)\,\theta(z - z^*)$, where $f_m(z)$ and $f_{m+1}(z)$ are
the force-extension curves before and after the jump ($f_m(z) = dE_m(z)/dz$). In order to include the rounding effect of thermal
fluctuations, we replace the $\theta$-function in this equation by the thermally averaged function $\tilde\theta$:

$$ \tilde\theta(z^* - z) = \frac{e^{-E_m(z)/k_B T}}{e^{-E_m(z)/k_B T} + e^{-E_{m+1}(z)/k_B T}} = \frac{1}{1 + e^{-(E_{m+1}(z) - E_m(z))/k_B T}} \qquad (8) $$
FIG. 3: The simple model of the copolymer. (a) Typical $f$-$z$ curve and (b) fluctuations in $f$ around a transition point.



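At the transition point, where $E_{m+1}(z) - E_m(z) = 0$, the thermally averaged function is $\tilde\theta(z^* - z) = 1/2$. In the
vicinity of the transition we can interpolate the difference $E_{m+1}(z) - E_m(z)$ as $2\varepsilon_{m+1}(z^* - z)/z^*$. Finally
we obtain:

$$ \tilde\theta(z^* - z) = \frac{1}{1 + \exp\left[-\dfrac{2\varepsilon_{m+1}(z^* - z)}{z^*\, k_B T}\right]} \qquad (9) $$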

and the fitting function for a transition is:

$$ f(z) = f_m(z)\,\tilde\theta(z^* - z) + f_{m+1}(z)\,\tilde\theta(z - z^*). \qquad (10) $$
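
A minimal sketch of the rounded saw-tooth used as a fitting function is given below. It assumes Gaussian branches $f = 2z/L$ (units $k_B T = b = 1$) and the linearized energy difference $2\varepsilon(z^* - z)/z^*$; the function and parameter names are illustrative.

```python
import math

def theta_avg(x, eps, z_star, kT=1.0):
    """Thermally averaged step function; x = z_star - z is its argument."""
    return 1.0 / (1.0 + math.exp(-2.0 * eps * x / (z_star * kT)))

def fit_function(z, z_star, L_before, L_after, eps):
    """Rounded saw-tooth: weighted mix of the branches before and after the jump."""
    f_m = 2.0 * z / L_before    # branch before the transition
    f_m1 = 2.0 * z / L_after    # branch after the transition
    w = theta_avg(z_star - z, eps, z_star)
    # theta_avg(z - z_star) equals 1 - theta_avg(z_star - z)
    return f_m * w + f_m1 * (1.0 - w)
```

At $z = z^*$ the function returns the mean of the two branches, and away from the transition it approaches the sharp saw-tooth, so the same three parameters (two slopes and $z^*$) control both the sharp and the rounded curve.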

There are three independent variables controlling the shape of a single saw-tooth jump (see Fig. 3): the slopes before and after
the transition and the location of the transition $z^*$. The same number of independent variables is needed for fitting the
thermally averaged curves. Each additional transition requires two additional variables for its description: the $z$-coordinate
of the transition and the slope after the transition, which can be related to $l_m$ and $\varepsilon_m$ through Eqs. (6) and (7).

If two or more transitions are close to each other, it might be difficult to determine the slope of the force-extension curve
in the regions between these transitions, especially in the presence of noise. Here we present fitting functions for two
overlapping transitions; the further generalization to multiple transitions is obvious. The combined fitting function for two
transitions can be written symbolically with the use of $\theta$ functions as:

$$ f(z) = f_m(z)\,\theta(z_m^* - z) + f_{m+1}(z)\,\theta(z - z_m^*)\,\theta(z_{m+1}^* - z) + f_{m+2}(z)\,\theta(z - z_{m+1}^*). $$

Here the averaged product of two $\theta$ functions represents

$$ \tilde\theta(z - z_m^*)\,\tilde\theta(z_{m+1}^* - z) = \frac{e^{-E_{m+1}/k_B T}}{e^{-E_m/k_B T} + e^{-E_{m+1}/k_B T} + e^{-E_{m+2}/k_B T}}. \qquad (11) $$

It should be noted that the released loops of the two transitions are not necessarily next to each other along the
chain. After global optimization over the fitting parameters, we obtain the best estimate for this situation. If the transitions
are too close to each other, $(z_{m+1}^* - z_m^*)/z^* < k_B T/2\varepsilon_m$, the fit gives a good estimate of the sum of the
energies $\varepsilon_{m+1} + \varepsilon_{m+2}$ and of the lengths $l_{m+1} + l_{m+2}$, but not of these quantities individually.

The symbols in Fig. 2(b) (o and +) represent the fitting results of the function in Eq. (10). When the first h-block is released,
the unknown parameters are the transition point $z^*$ and the lengths of the released p-blocks before and after the transition,
$l_1$ and $l_1 + l_2$. Notice that when the total sizes of the globules on the string before and after the event, $a_m$ and
$a_{m+1}$, are not small, one should consider them as additional fitting parameters, so that the force has the form
$f_m = \frac{2(z - a_m)}{L_m}\frac{k_B T}{b}$. The fit (o) is obtained with the parameters $z^* \simeq 8.3b$, $l_1 \simeq 3.0b$,
$l_1 + l_2 \simeq 8.0b$, $a_1 \simeq 5.2b$ and $a_2 \simeq 7.2b$. When the second h-block is released (+), $z^*$, $l_1 + l_2$ and
$l_1 + l_2 + l_3$ are unknown. From the second event we obtain $z^* \simeq 13.4b$, $l_1 + l_2 \simeq 9.1b$,
$l_1 + l_2 + l_3 \simeq 11.0b$, $a_2 \simeq 6.3b$ and $a_3 \simeq 10.3b$.

In the end we have $l_1 = 3.0b$, $l_2 = 5.0b$ and $l_3 = 1.9b$, which are in agreement with the exact values 3, 5, 3 for the
p-blocks. The interaction energies estimated from the forces before and after the event with Eq. (5),
$\varepsilon_m = [f_m\,(z^* - a_m) - f_{m+1}\,(z^* - a_{m+1})]/2$, are $\Delta\varepsilon_1/k_B T \simeq 3.1$ and
$\Delta\varepsilon_2/k_B T \simeq 4.7$. The estimated interaction strengths are in agreement with the calculated values
$\varepsilon_1/k_B T = (\gamma/k_B T)\,(S_1^1 + S_2^2 - S_1^2) = 4.9$ and
$\varepsilon_2/k_B T = (\gamma/k_B T)\,(S_2^2 + S_3^3 - S_2^3) = 4.1$.
Matching the noise pattern. The above fitting was done to the thermodynamically averaged transition curve. In practice this
curve can be quite noisy, especially in the transition region, because the system fluctuates between two different configurations
with similar energies, and time averaging could be costly. It makes sense to measure the noise directly as a function of the
extension $z$ and to try to extract structural information from it. Calculating the mean-square magnitude of the thermal noise,
we get:



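$$ \left\langle \left(f(z) - \bar f(z)\right)^2 \right\rangle = \tilde\theta(z^* - z)\,\tilde\theta(z - z^*)\,\frac{4\varepsilon_{m+1}^2}{z^{*2}} \qquad (12) $$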



This function is proportional to the product of the two thermally averaged $\tilde\theta$ functions defined in Eq. (9) and is
sharply peaked, as shown in the second inset of Fig. 3(b).
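
The noise profile can be sketched as the variance of a two-state system that switches between the force branches before and after the jump. The snippet assumes occupation probabilities given by the thermally averaged step function and a jump height $\Delta f = 2\varepsilon/z^*$ (units $k_B T = b = 1$); the function name is a hypothetical choice.

```python
import math

def force_variance(z, z_star, eps, kT=1.0):
    """Mean-square force noise near one transition: p * (1 - p) * df**2."""
    # probability of the conformation before the jump
    p = 1.0 / (1.0 + math.exp(-2.0 * eps * (z_star - z) / (z_star * kT)))
    # jump height implied by eps = df * z_star / 2
    df = 2.0 * eps / z_star
    return p * (1.0 - p) * df**2

# noise profile on a grid around a hypothetical transition at z* = 4.4
zs = [4.0 + 0.01 * i for i in range(81)]
vals = [force_variance(z, 4.4, 4.0) for z in zs]
```

The profile peaks at the transition point, where both conformations are equally likely, and its width shrinks as the binding energy grows, so the position and width of the noise peak carry the same information as the rounded jump.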

Langevin chain. For practical applications, we consider the Langevin chain (with fixed bond length), for which the chain
extension is given by the following Langevin equation:

$$ \frac{z}{L} = \coth\left(\frac{f b}{k_B T}\right) - \frac{k_B T}{f b} \qquad (13) $$



In the limit of strong stretching, this equation simplifies to $fb/k_B T \simeq 1/(1 - z/L)$, and in the weak stretching limit it
reproduces the linear response behavior $fb/k_B T \sim z/L$. We can assume that before and after the transition point the
$f$-$z$ curve is described by the strong and the weak stretching behavior, respectively. Then instead of Eq. (7) we have:



$$ l_{m+1} = z^*\left[\frac{1}{f_{m+1} b/k_B T} - \frac{1}{1 - k_B T/f_m b}\right] \qquad (14) $$
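
The Langevin relation and the two limiting forms can be sketched as follows. The released length is computed as $L_{m+1} - L_m$, with $L_m$ taken from the strong-stretching form and $L_{m+1}$ from the weak-stretching form quoted in the text; note that the exact small-force expansion of the Langevin function is $z/L \simeq fb/3k_B T$, so the linear form used here omits that numerical factor. Function names are illustrative.

```python
import math

def langevin_extension(x):
    """Relative extension z/L = coth(x) - 1/x for reduced force x = f b / kT."""
    if abs(x) < 1e-4:
        return x / 3.0  # leading series term, avoids the 0/0 cancellation
    return 1.0 / math.tanh(x) - 1.0 / x

def released_length(z_star, x_before, x_after):
    """l_{m+1} = L_{m+1} - L_m for reduced forces x = f b / kT at the jump.

    L_m:     strong stretching, x_before = 1 / (1 - z*/L_m)
    L_{m+1}: weak stretching as written in the text, x_after = z*/L_{m+1}
    """
    L_before = z_star / (1.0 - 1.0 / x_before)
    L_after = z_star / x_after
    return L_after - L_before
```

For example, `langevin_extension(20.0)` is close to the strong-stretching value $1 - 1/20$, which shows how quickly the asymptotic form becomes accurate once the reduced force is large.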



This reading method can be applied to experimental curves of protein domain unfolding, where each saw-tooth (jump)
corresponds to the unraveling of a single domain. We do not try to fit the detailed shape of the curve, which is often treated
with the worm-like-chain model. We note that the positions of the peaks and the depths of the jumps can be mapped directly onto
our 1-d globule-string model. We may map the number of monomers in each domain onto the size of the connecting p-block in our
model, because after the unfolding of each domain the extension increases by the length corresponding to the domain size. The
binding energy of each domain is then the interaction energy between two h-globules, i.e. $\varepsilon_m$.



IV. CONCLUSIONS 



We have demonstrated that some structural information on heteropolymers can be extracted from force-extension curves using
a simple model. In this work, we assumed that the pulling process is so slow that the system is always in thermodynamic
equilibrium. This means that all possible conformations can contribute to the elastic response with the appropriate
thermodynamic weight. In a future publication we will report the effect of a finite pulling rate, where the number of accessible
configurations is controlled by the pulling rate.